Available Technology for Discovering Causal Models, Building Bayes Nets, and Selecting Predictors: The TETRAD II Program

Author

  • Clark Glymour
Abstract

This paper describes the facilities available for knowledge discovery in databases using the TETRAD II program. While a year or two shy of the state of the most advanced research on discovery, we believe this program provides the most flexible and reliable suite of procedures so far available commercially for discovering causal structure, semiautomatically constructing Bayes networks, estimating parameters in such networks, and updating. The program can also be used to reduce the number of variables needed for classification or prediction, for example as a neural net preprocessor. The theoretical principles on which the program is based are described in detail in Spirtes, Glymour and Scheines (1993). Under assumptions described there, each of the search and discovery procedures we will describe has been proved to give correct information when statistical decisions are made correctly. [1]

[1] Research supported by the Navy Office of Personnel Research and Development, and the Office of Naval Research, contract N00014-93-1-0568. This work is a collaboration with C. Meek, R. Scheines and P. Spirtes.

1. What Does TETRAD Do?

This paper describes the facilities available for knowledge discovery in databases using the TETRAD II program. While a year or two shy of the state of the most advanced research on discovery, we believe this program provides the most flexible and reliable suite of procedures so far available commercially for discovering causal structure, semiautomatically constructing Bayes networks, estimating parameters in such networks, and updating. The theoretical principles on which the program is based are described in detail in Spirtes, Glymour and Scheines (1993). Under assumptions described there, each of the search and discovery procedures we will describe has been proved to give correct information when statistical decisions about independence and conditional independence
have outcomes that are correct in the population distribution. Each of the procedures has also been extensively tested on simulated data samples of realistic sizes. The program includes:

  • A module (BUILD) that combines the user's knowledge about the system under study with principles for extracting causal structure from statistical patterns, for data sets with continuous variables or for data sets with discrete variables. The procedure contains a switch that permits the user to assume, or not, that no latent variables are present.
  • Functions that indicate when two or more measured variables may all be influenced by an unmeasured common cause.
  • A module (ESTIMATE) that gives maximum likelihood estimates for parameters in statistical models describing influences between measured discrete variables. With discrete data and a little help from the user, BUILD and ESTIMATE will construct a fully parameterized Bayes network for a domain.
  • A module (UPDATE) that updates a fully parameterized Bayes network to make predictions about any of the properties of a new unit or example from information about some of the properties in that unit.
  • A module (PURIFY) that takes raw data or a covariance matrix for normally distributed variables that the user assumes to have at least one unmeasured common cause, and finds a subset of variables that have exactly one unmeasured common cause and no other causal relations with one another.
  • A module (MIMbuild) that determines structural dependencies among latent variables given correlational data and a purified measurement model.
  • A module that prepares input files for other estimation and testing packages (EQS, LISREL, CALIS) for linear models.
  • A module (MONTE) that allows the user to generate simulated data for a wide variety of causal models.

From: KDD-95 Proceedings. Copyright © 1995, AAAI (www.aaai.org). All rights reserved.
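To make the ideas of maximum likelihood estimation and updating in a fully parameterized discrete Bayes network concrete, here is a minimal Python sketch. It is not TETRAD II code and does not reproduce the ESTIMATE or UPDATE modules. The three-variable structure (A -> B, A -> C), the toy data, and every function name are assumptions made for illustration only. The estimates are plain relative frequencies, and updating is done by brute-force enumeration, which is only practical for very small networks.

```python
from collections import Counter
from itertools import product

# Hypothetical structure (an assumption): A -> B and A -> C, all variables binary.
PARENTS = {"A": (), "B": ("A",), "C": ("A",)}

def estimate(rows):
    """Maximum likelihood conditional probabilities: relative frequency of each
    value of a variable given the values of its parents."""
    joint, marg = Counter(), Counter()
    for row in rows:
        for var, pars in PARENTS.items():
            key = tuple(row[p] for p in pars)
            joint[(var, key, row[var])] += 1
            marg[(var, key)] += 1
    return {k: n / marg[(k[0], k[1])] for k, n in joint.items()}

def joint_prob(cpt, assignment):
    """Probability of a complete assignment as a product of conditional probabilities."""
    p = 1.0
    for var, pars in PARENTS.items():
        key = tuple(assignment[q] for q in pars)
        p *= cpt.get((var, key, assignment[var]), 0.0)  # unseen combinations get 0
    return p

def update(cpt, target, evidence):
    """P(target | evidence) by summing the joint over all unobserved variables.
    Assumes the evidence has positive probability under the estimated network."""
    hidden = [v for v in PARENTS if v != target and v not in evidence]
    scores = {}
    for t in (0, 1):
        total = 0.0
        for values in product((0, 1), repeat=len(hidden)):
            assignment = dict(evidence, **dict(zip(hidden, values)))
            assignment[target] = t
            total += joint_prob(cpt, assignment)
        scores[t] = total
    z = sum(scores.values())
    return {t: s / z for t, s in scores.items()}

# Toy data set of 8 units (invented for this example).
ROWS = (
    [{"A": 1, "B": 1, "C": 1}] * 3 + [{"A": 1, "B": 0, "C": 1}]
    + [{"A": 0, "B": 0, "C": 0}] * 3 + [{"A": 0, "B": 1, "C": 0}]
)
cpt = estimate(ROWS)
posterior = update(cpt, "A", {"B": 1})
print(posterior)  # posterior[1] is P(A=1 | B=1) = 0.75 for this toy data
```

In this toy data, B = 1 occurs three times with A = 1 and once with A = 0, so observing B = 1 on a new unit raises the probability of A = 1 from 0.5 to 0.75.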
The TETRAD II program does not do routine data cleanup tasks such as checks for outliers, variable transformations to approximate normal distributions, etc. Neither does it do model diagnostics of the kind performed by many readily available statistical packages such as MINITAB, SAS, BMDP or SYSTAT. We recommend that, where possible, checks and adjustments of the data be carried out first by one of these systems prior to a TETRAD II analysis. The TETRAD II program does not estimate the parameters of linear "structural equation models" or provide tests of significance for such models, since these procedures are carried out by a number of commercial packages such as CALIS, LISREL and EQS.

2. Graphical Models

Many statistical models that are given by equations and distribution assumptions can be described more vividly but equally precisely by simple directed graphs. A directed edge X -> Y indicates both that X influences Y and that Y is a function of at least X. For example, suppose we consider a regression model for Y with regressors X1,...,X4. The model might be given by an equation and a distribution claim:

Y = a1 X1 + a2 X2 + a3 X3 + a4 X4 + E

All variables are jointly normally distributed, each variable in the set {X1,...,X4, E} is independent of the other variables in the set, and E has mean zero. The statistical model has a number of free parameters that must be estimated from the data. They include the numerical values of the coefficients a1, a2, etc., the variance of E, and the means and variances of X1,...,X4. We could equally describe the model by saying that the variables are jointly normal and giving the picture:
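One way to make the regression example concrete is the following Python sketch. It assumes the directed graph has an edge from each regressor and from the error term E into Y, which follows from the edge definition given above but is not taken from the original figure. The coefficient values, the unit variances, and all function names are hypothetical. The sketch simulates data from the model and recovers each coefficient as cov(Xi, Y) / var(Xi), which is valid here only because the regressors are independent of one another.

```python
import random

# Directed graph for the regression example (assumed): each regressor and E point into Y.
EDGES = [("X1", "Y"), ("X2", "Y"), ("X3", "Y"), ("X4", "Y"), ("E", "Y")]

# Hypothetical values for the coefficients a1..a4, chosen only for illustration.
COEF = {"X1": 1.0, "X2": -0.5, "X3": 2.0, "X4": 0.3}

def parents(node):
    """Direct causes of a node, read off the edge list."""
    return [src for src, dst in EDGES if dst == node]

def simulate(n, seed=0):
    """Draw n units: X1..X4 and E are independent standard normals (E has mean
    zero), and Y is the linear function of its parents given by COEF plus E."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        row = {v: rng.gauss(0.0, 1.0) for v in parents("Y")}
        row["Y"] = sum(COEF[x] * row[x] for x in COEF) + row["E"]
        rows.append(row)
    return rows

def cov(rows, u, v):
    """Sample covariance of columns u and v."""
    n = len(rows)
    mean_u = sum(r[u] for r in rows) / n
    mean_v = sum(r[v] for r in rows) / n
    return sum((r[u] - mean_u) * (r[v] - mean_v) for r in rows) / (n - 1)

rows = simulate(20000)
# With independent regressors, each coefficient equals cov(Xi, Y) / var(Xi).
estimates = {x: cov(rows, x, "Y") / cov(rows, x, x) for x in COEF}
for name in COEF:
    print(name, round(estimates[name], 2), "true value", COEF[name])
```

The free parameters named in the text correspond to the entries of `COEF`, the variance of E, and the means and variances of the regressors, all of which are fixed by hand in this sketch rather than estimated by a dedicated package.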


Similar resources

Counterfactuals, graphical causal models and potential outcomes: Response to Lindquist and Sobel

Lindquist and Sobel claim that the graphical causal models they call "agnostic" do not imply any counterfactual conditionals. They doubt that "causal effects" can be discovered using graphical causal models typical of SEMs, DCMs, Bayes nets, Granger causal models, etc. Each of these claims is false or exaggerated. They recommend instead that investigators adopt the "potential outcomes" framewor...


Simulation studies of the reliability of computer aided model specification using the TETRAD, EQS and LISREL programs

TETRAD II, a fully automated successor to the TETRAD program, is intended to aid in the respecification of underspecified linear causal models, or structural equation models. The performance of TETRAD II is compared with the automatic respecification procedures in the EQS and LISREL VI programs using 360 simulated data sets from nine different linear models containing "latent" or unmeasured vari...


Causal Discovery via MML

Automating the learning of causal models from sample data is a key step toward incorporating machine learning into decisionmaking and reasoning under uncertainty. This paper presents a Bayesian approach to the discovery of causal models, using a Minimum Message Length (MML) method. We have developed encoding and search methods for discovering linear causal models. The initial experimental resul...


Learning Measurement Models for Unobserved Variables

Observed associations in a database may be due in whole or part to variations in unrecorded ("latent") variables. Identifying such variables and their causal relationships with one another is a principal goal in many scientific and practical domains. Previous work shows that, given a partition of observed variables such that members of a class share only a single latent common cause, standa...


A theory of causal learning in children: causal maps and Bayes nets.

The authors outline a cognitive and computational account of causal learning in children. They propose that children use specialized cognitive systems that allow them to recover an accurate "causal map" of the world: an abstract, coherent, learned representation of the causal relations among events. This kind of knowledge can be perspicuously understood in terms of the formalism of directed gra...



Publication date: 1995